 Imperial County


AI use in American newspapers is widespread, uneven, and rarely disclosed

Russell, Jenna, Karpinska, Marzena, Akinode, Destiny, Thai, Katherine, Emi, Bradley, Spero, Max, Iyyer, Mohit

arXiv.org Artificial Intelligence

AI is rapidly transforming journalism, but the extent of its use in published newspaper articles remains unclear. We address this gap by auditing a large-scale dataset of 186K articles from online editions of 1.5K American newspapers published in the summer of 2025. Using Pangram, a state-of-the-art AI detector, we discover that approximately 9% of newly-published articles are either partially or fully AI-generated. This AI use is unevenly distributed, appearing more frequently in smaller, local outlets, in specific topics such as weather and technology, and within certain ownership groups. We also analyze 45K opinion pieces from Washington Post, New York Times, and Wall Street Journal, finding that they are 6.4 times more likely to contain AI-generated content than news articles from the same publications, with many AI-flagged op-eds authored by prominent public figures. Despite this prevalence, we find that AI use is rarely disclosed: a manual audit of 100 AI-flagged articles found only five disclosures of AI use. Overall, our audit highlights the immediate need for greater transparency and updated editorial standards regarding the use of AI in journalism to maintain public trust.


Coupling Agent-based Modeling and Life Cycle Assessment to Analyze Trade-offs in Resilient Energy Transitions

Zhang, Beichen, Zaki, Mohammed T., Breunig, Hanna, Ajami, Newsha K.

arXiv.org Artificial Intelligence

Transitioning to sustainable and resilient energy systems requires navigating complex and interdependent trade-offs across environmental, social, and resource dimensions. Neglecting these trade-offs can lead to unintended consequences across sectors. However, existing assessments often evaluate emerging energy pathways and their impacts in silos, overlooking critical interactions such as regional resource competition and cumulative impacts. We present an integrated modeling framework that couples agent-based modeling and Life Cycle Assessment (LCA) to simulate how energy transition pathways interact with regional resource competition, ecological constraints, and community-level burdens. We apply the model to a case study in Southern California. The results demonstrate how integrated and multiscale decision making can shape energy pathway deployment and reveal spatially explicit trade-offs under scenario-driven constraints. This modeling framework can further support more adaptive and resilient energy transition planning on spatial and institutional scales.


Prisoner gunned down outside MacArthur Park facility for state inmates nearing release

Los Angeles Times

The California Department of Corrections and Rehabilitation operates a reentry facility across the street from MacArthur Park. Two inmates living at the facility were shot, one fatally, on Sept. 2. One man was killed and another wounded outside a facility for state prisoners serving out the remainder of their sentences in the community.


QuesGenie: Intelligent Multimodal Question Generation

Mubarak, Ahmed, Ahmed, Amna, Nasser, Amira, Mohamed, Aya, El-Sadek, Fares, Ahmed, Mohammed, Salah, Ahmed, Sobhy, Youssef

arXiv.org Artificial Intelligence

In today's information-rich era, learners have access to abundant educational resources, but the lack of practice materials tailored to these resources presents a significant challenge. This project addresses that gap by developing a multimodal question generation system that can automatically generate diverse question types from various content formats. This project lays the foundation for automated, scalable, and intelligent question generation, carefully balancing resource efficiency, robust functionality and a smooth user experience. Creating assessment questions is a time-consuming and labor-intensive task for educators. Traditional methods require manual extraction of information from materials, which can lead to inconsistencies and errors. Additionally, students often struggle to find varied practice questions that cover all aspects of the material they are studying. With the increasing use of multimedia in educational content, there is a growing need for systems that can process various data types, including text, diagrams, and audio recordings.


Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs

Yang, Hongming, Lin, Shi, Shao, Jun, Lin, Changting, Zhu, Donghai, Han, Meng, Kong, Qinglei

arXiv.org Artificial Intelligence

Lightweight Large Language Models (LwLLMs) are reduced-parameter, optimized models designed to run efficiently on consumer-grade hardware, offering significant advantages in resource efficiency, cost-effectiveness, and data privacy. However, these models often struggle with limited inference and reasoning capabilities, which restrict their performance on complex tasks and limit their practical applicability. Moreover, existing prompt optimization methods typically rely on extensive manual effort or the meta-cognitive abilities of state-of-the-art LLMs, making them less effective for LwLLMs. To address these challenges, we introduce DeBoP, a new Direct Behavior Optimization Paradigm, derived from the Chain-of-Thought (CoT) prompting technique. Unlike CoT prompting, DeBoP is an automatic optimization method that directly optimizes the behavior of LwLLMs. In particular, DeBoP transforms the optimization of complex prompts into the optimization of discrete, quantifiable execution sequences using a gradient-free Monte Carlo Tree Search. We evaluate DeBoP on seven challenging tasks where state-of-the-art LLMs excel but LwLLMs generally underperform. Experimental results demonstrate that DeBoP significantly outperforms recent prompt optimization methods on most tasks. In particular, DeBoP-optimized LwLLMs surpass GPT-3.5 on most tasks while reducing computational time by approximately 60% compared to other automatic prompt optimization methods.


The Muon Space GNSS-R Surface Soil Moisture Product

Roberts, Max, Colwell, Ian, Chew, Clara, Masters, Dallas, Nordstrom, Karl

arXiv.org Artificial Intelligence

Muon Space (Muon) is building a constellation of small satellites, many of which will carry global navigation satellite system-reflectometry (GNSS-R) receivers. In preparation for the launch of this constellation, we have developed a generalized deep learning retrieval pipeline, which now produces operational GNSS-R near-surface soil moisture retrievals using data from NASA's Cyclone GNSS (CYGNSS) mission. In this article, we describe the input datasets, preprocessing methods, model architecture, and development methods, and detail the soil moisture products generated from these retrievals. The performance of this product is quantified against in situ measurements and compared to both the target dataset (retrievals from the Soil Moisture Active-Passive (SMAP) satellite) and the v1.0 soil moisture product from the CYGNSS mission. The Muon Space product achieves improvements in spatial resolution over SMAP with comparable performance in many regions. An ubRMSE of 0.032 cm$^3$ cm$^{-3}$ for in situ soil moisture observations from SMAP core validation sites is shown, though performance is lower than SMAP's in forests and/or mountainous terrain. The Muon Space product outperforms the v1.0 CYGNSS soil moisture product in almost all aspects. This initial release serves as the foundation of our operational soil moisture product, which will soon also include data from Muon Space satellites.
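
For readers unfamiliar with the ubRMSE metric quoted above, the following is a minimal sketch of the standard unbiased root-mean-square error, i.e. the RMSE that remains after the mean bias between retrievals and observations is removed. The function name `ubrmse` and the sample values are illustrative assumptions; this is not the authors' validation code.

```python
import math

def ubrmse(predicted, observed):
    """Unbiased RMSE: sqrt(MSE - bias^2), the RMSE after removing mean bias.

    Hypothetical helper for illustration; inputs are equal-length sequences
    of soil moisture values in the same units (e.g. cm^3 cm^-3).
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - o for p, o in zip(predicted, observed)]
    bias = sum(errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    # max() guards against tiny negative values from floating-point rounding
    return math.sqrt(max(mse - bias * bias, 0.0))

# Made-up example values: a constant offset contributes bias but no ubRMSE.
retrieved = [0.10, 0.20, 0.30]
in_situ = [0.05, 0.15, 0.25]
print(ubrmse(retrieved, in_situ))
```

Because a constant offset is subtracted out, ubRMSE reflects how well a product tracks variations in soil moisture rather than its absolute calibration.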


DISCO: DISCovering Overfittings as Causal Rules for Text Classification Models

Zhang, Zijian, Setty, Vinay, Wang, Yumeng, Anand, Avishek

arXiv.org Artificial Intelligence

With the rapid advancement of neural language models, the deployment of over-parameterized models has surged, increasing the need for interpretable explanations comprehensible to human inspectors. Existing post-hoc interpretability methods, which often focus on unigram features of single input textual instances, fail to capture the models' decision-making process fully. Additionally, many methods do not differentiate between decisions based on spurious correlations and those based on a holistic understanding of the input. Our paper introduces DISCO, a novel method for discovering global, rule-based explanations by identifying causal n-gram associations with model predictions. This method employs a scalable sequence mining technique to extract relevant text spans from training data, associate them with model predictions, and conduct causality checks to distill robust rules that elucidate model behavior. These rules expose potential overfitting and provide insights into misleading feature combinations. We validate DISCO through extensive testing, demonstrating its superiority over existing methods in offering comprehensive insights into complex model behaviors. Our approach successfully identifies all shortcuts manually introduced into the training data (100% detection rate on the MultiRC dataset), resulting in an 18.8% regression in model performance -- a capability unmatched by any other method. Furthermore, DISCO supports interactive explanations, enabling human inspectors to distinguish spurious causes in the rule-based output. This alleviates the burden of abundant instance-wise explanations and helps assess the model's risk when encountering out-of-distribution (OOD) data.


AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs

Mousi, Basel, Durrani, Nadir, Ahmad, Fatema, Hasan, Md. Arid, Hasanain, Maram, Kabbani, Tameem, Dalvi, Fahim, Chowdhury, Shammur Absar, Alam, Firoj

arXiv.org Artificial Intelligence

Arabic, with its rich diversity of dialects, remains significantly underrepresented in Large Language Models, particularly in dialectal variations. We address this gap by introducing seven synthetic datasets in dialects alongside Modern Standard Arabic (MSA), created using Machine Translation (MT) combined with human post-editing. We present AraDiCE, a benchmark for Arabic Dialect and Cultural Evaluation. We evaluate LLMs on dialect comprehension and generation, focusing specifically on low-resource Arabic dialects. Additionally, we introduce the first-ever fine-grained benchmark designed to evaluate cultural awareness across the Gulf, Egypt, and Levant regions, providing a novel dimension to LLM evaluation. Our findings demonstrate that while Arabic-specific models like Jais and AceGPT outperform multilingual models on dialectal tasks, significant challenges persist in dialect identification, generation, and translation. This work contributes ~45K post-edited samples and a cultural benchmark, and highlights the importance of tailored training to improve LLM performance in capturing the nuances of diverse Arabic dialects and cultural contexts. We will release the dialectal translation models and benchmarks curated in this study.


Crafting Large Language Models for Enhanced Interpretability

Sun, Chung-En, Oikarinen, Tuomas, Weng, Tsui-Wei

arXiv.org Artificial Intelligence

We introduce the Concept Bottleneck Large Language Model (CB-LLM), a pioneering approach to creating inherently interpretable Large Language Models (LLMs). Unlike traditional black-box LLMs that rely on post-hoc interpretation methods with limited neuron function insights, CB-LLM sets a new standard with its built-in interpretability, scalability, and ability to provide clear, accurate explanations. This innovation not only advances transparency in language models but also enhances their effectiveness. Our unique Automatic Concept Correction (ACC) strategy successfully narrows the performance gap with conventional black-box LLMs, positioning CB-LLM as a model that combines the high accuracy of traditional LLMs with the added benefit of clear interpretability -- a feature markedly absent in existing LLMs.


Aligning Teacher with Student Preferences for Tailored Training Data Generation

Liu, Yantao, Zhang, Zhao, Yao, Zijun, Cao, Shulin, Hou, Lei, Li, Juanzi

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown significant promise as copilots in various tasks. Local deployment of LLMs on edge devices is necessary when handling privacy-sensitive data or latency-sensitive tasks. The computational constraints of such devices make direct deployment of powerful large-scale LLMs impractical, necessitating Knowledge Distillation from large-scale models to lightweight models. Much work has been done to elicit diverse, high-quality training examples from LLMs, but little attention has been paid to aligning teacher instructional content with student preferences, akin to "responsive teaching" in pedagogy. Thus, we propose ARTE, dubbed Aligning TeacheR with StudenT PreferencEs, a framework that aligns the teacher model with student preferences to generate tailored training examples for Knowledge Distillation. Specifically, we elicit draft questions and rationales from the teacher model, then collect student preferences on these questions and rationales using students' performance with in-context learning as a proxy, and finally align the teacher model with student preferences. In the end, we repeat the first step with the aligned teacher model to elicit tailored training examples for the student model on the target task. Extensive experiments on academic benchmarks demonstrate the superiority of ARTE over existing instruction-tuning datasets distilled from powerful LLMs. Moreover, we thoroughly investigate the generalization of ARTE, including the generalization of fine-tuned student models in reasoning ability and the generalization of aligned teacher models to generate tailored training data across tasks and students. In summary, our contributions lie in proposing a novel framework for tailored training example generation, demonstrating its efficacy in experiments, and investigating the generalization of both student and aligned teacher models in ARTE.
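
The preference-collection step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how preference pairs are formed or which alignment algorithm is used, so the function `build_preference_pairs`, the top-k/bottom-k pairing rule, and the `student_score` callback (standing in for the student's in-context performance when a draft is used as a demonstration) are all hypothetical.

```python
from typing import Callable, List, Tuple

def build_preference_pairs(
    candidates: List[str],
    student_score: Callable[[str], float],
    k: int = 1,
) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) pairs from teacher drafts.

    Assumption: drafts that help the student most are 'chosen' and those
    that help least are 'rejected'; the real method may differ.
    """
    # Rank drafts from most to least helpful according to the student proxy.
    ranked = sorted(candidates, key=student_score, reverse=True)
    # Never let the chosen and rejected sets overlap.
    k = min(k, len(ranked) // 2)
    chosen = ranked[:k]
    rejected = ranked[-k:] if k else []
    # Pair the best draft with the worst, the second best with the second worst, etc.
    return list(zip(chosen, reversed(rejected)))

# Made-up scores standing in for measured in-context accuracy.
scores = {"draft A": 0.1, "draft B": 0.9, "draft C": 0.5, "draft D": 0.3}
pairs = build_preference_pairs(list(scores), scores.get, k=2)
print(pairs)
```

Pairs of this shape are the usual input to preference-based alignment methods; which method is applied to the teacher is left open here because the abstract does not name one.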